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ABSTRACT 

Assuming that approximate registration is given within a few pixels by a systematic correction system, we develop 
automatic image registration methods for multi-sensor data with the goal of achieving sub-pixel accuracy. Automatic image 
registration is usually defined by three steps; feature extraction, feature matching, and data resampling or fusion. Our 
previous work focused on image correlation methods based on the use of different features. In this paper, we study different 
feature matching techniques and present five algorithms where the features are either original gray levels or wavelet-like 
features, and the feature matching is based on gradient descent optimization, statistical robust matching, and mutual 
information. These algorithms are tested and compared on several multi-sensor datasets covering one of the EOS Core 
Sites, the Konza Prairie in Kansas, from four different sensors; IKONOS (4m), Landsat-7/ETM+ (30m), MODIS (500m), 
and SeaWIFS (1000m). 

Keywords: image registration, multi-sensor, wavelet representation, optimization, Hausdorff distance, mutual information 


1. INTRODUCTION 

Remote sensing challenges include predicting regional climate change, understanding interactions between human 
activity and the changes in the major Earth ecosystems, and processing data acquired by formation flying systems. For each 
of these applications, creating continuity of the data through integration and seamless mosaicking of multiple sensor data, as 
well as extrapolation among several scales, temporal, spatial, and spectral, are key components. Very accurate registration is 
the first requirement of these components. Automatic image registration and fusion also represent intelligent technologies 
that would reduce mission costs. On-board image registration and fusion would enable autonomous decisions to be taken 
on-board, and would make formation flying adaptive, self-reliant, and cooperative. For planet exploration, in-situ automatic 
image registration and fusion would enable a robot or a fleet of robots to navigate and explore remote planets, analyzing 
multiple sensor data, building a map of its environment, and making intelligent decisions about planning its path. 

While navigation often refers to “systematic correction”, image registration refers to “precision correction.” The 
systematic correction is model-based, while precision correction is feature-based. Starting from the results of the systematic 
correction (usually accurate within a few pixels), precision-correction utilizes selected features or control points to refine 
the geo-location accuracy within one pixel or a sub-pixel. In this work, we will focus on precision correction or automatic 
image registration. Currently, there is a large quantity of potential image registration methods that have been developed for 
aerial or medical images and that are applicable to remote sensing images' 2 . But there is no consolidated approach that 
would help a remote sensing user or designer in choosing the method the most adapted to his/her application or system 
development. The intent of our work is to survey, design, and develop different components of the registration process and 
to evaluate their performance independently on well-chosen multi-sensor data. Our previous work has focused on 


correlation-based methods 3 and various types of wavelets 4,5 . In this work, we are looking at different similarity metrics and 
strategies. 

Section 2 gives a general definition of image registration followed by a detailed description of the five different 
algorithms that are considered in this study. Section 3 presents the results obtained when the five methods are applied to 
data acquired by the four sensors, IKONOS, Landsat-7/ETM+, MODIS, and SeaWIFS, over one of the EOS Core Sites, the 
Konza Prairie in Kansas, USA. 


2. IMAGE REGISTRATION 


2.1. General Definition of Image Registration 

Within the context of satellite data geo-registration, feature-based, precision-correction or automatic image registration 
of satellite image data is defined as the process which determines the best match of two or more images acquired at the 
same or at different times by different or identical sensors. One set of data is taken as the reference data, and all other data, 
called input data (or sensed data), is matched relative to the reference data. The general process of image registration 
includes three main steps: 

1) the extraction of features to be used in the matching process, 

2) the feature matching strategy and metrics, and 

3) the resampling of the data based on the correspondence computed from matched features. 

For some applications, step (3) is replaced by an indexing of the incoming data into an absolute reference system, such 
as (latitude, longitude) for an Earth reference system, or by a fusion process. This paper only deals with steps (1) and (2). A 
large number of automatic image registration methods have been proposed and surveys can be found in * . Some of the 
features that are being utilized for step (1) are: original gray levels, edges, regions, and more recently wavelet features. 
According to Brown 2 , step (2) itself can be separated into: 

• the search space , i.e. the class of potential transformations that establish the correspondence between input data and 
reference data. Transformations that are often used are rigid transformations (composed of a scaling, a translation and a 
rotation), affine transformations (composed of a rigid transformation, a shear and an aspect-ratio change), and 
polynomial transformations. 

• a search strategy, which is used to choose which transformations have to be computed and evaluated. Local or global 
search, multi-resolution search or optimization techniques are examples of various search strategies. 

• a similarity metric, which evaluates the match between input data and transformed reference data for a given 
transformation chosen in the search space. Correlation measurement has been the most often used but other methods such 
as Mutual Information 5 or a Hausdorff 6 distance can also be utilized. 

Our previous work 3 has mainly been dealing with features such as gray levels, edges, and orthogonal wavelet 
coefficients, and matching them using cross-correlation as a similarity metrics. This study intends to pursue and extend this 
work, by looking at other feature matching strategies and metrics. 

2.2. Description of Previous Correlation-Based Experiments 

Using cross-correlation as a similarity metrics, features such as gray levels, edges, and Daubechies wavelet coefficients 
were compared using mono-sensor data 3 . When using gray levels as features, our previous work evaluated their matching 
using either a basic spatial correlation or a phase correlation (computed as the phase of the cross-power spectrum of the two 
images which only measures translations). When using edge features , we perform the registration in an iterative manner, 
first estimating independently the parameters of the deformation transformation on a 64x64 portion extracted from the 
center of the two images, and then iteratively refining these parameters using larger and larger portions of the images. For 
this implementation, the transformation was modeled as a combination of a scaling in both directions, a rotation, and a shift 
in both directions. Wavelet features were also extracted after decomposing both images with a discrete orthonormal basis of 
wavelets (Daubechies' least asymmetric filters 7 ) in a multi-resolution fashion. Low-pass features, which provide a 
compressed version of the original data and some texture information, and high-pass features, which provide detailed edge- 
like information, were both considered as potential registration features. They were matched by following the multi- 
resolution framework of the wavelet decomposition, with a first transformation estimated on small low-resolution images 
and iteratively refined on larger and larger images of higher and higher spatial resolutions. 

The first evaluation 3 was performed using three datasets; two "synthetic datasets" for which the true transformation 
parameters were known, and one dataset for which no ground truth was available but manual registration was computed. 
Using cross-correlation as a similarity metrics, all previous features were evaluated for registration purposes, using accuracy 
and computation times as evaluation criteria. Results showed that, as expected, edges or edge-like features like wavelets - as 



opposed to gray level features - are more robust to noise, local intensity variations or time-of-the day conditions than 
original gray level values. On the other hand, when only looking for translation on clear data, phase correlation provides a 
fast and usually accurate answer. Comparing edges and wavelets, wavelet-based registration is usually faster than an edge- 
based registration, but edges provide more accurate answers than the orthogonal wavelet-based registration. This lack of 
accuracy is mainly due to the lack of translation- invariance of orthogonal wavelets. Although this first evaluation was very 
preliminary due to the limited amount and type of test data, it indicated that the choice of each component of a registration 
scheme is a trade-off between accuracy and timing and is dependent on the type of data to be registered. 

2,3. Five Image Registration Algorithms 

Our previous work focused on correlation-based methods. Although correlation measurement is one of the most 
common similarity metrics used in registration, it is computationally expensive and noise sensitive when used on original 
gray level data. Using pre-processing such as edge detection or a multi-resolution search strategy enables large reductions in 
computing time and increases the robustness of the algorithms. Other computational speed-ups are also obtained by using 
Fast Fourier Transforms (FFT’s). In our previous work, cross-correlation was used with an exhaustive search. For each 
possible transformation, cross-correlation of the transformed reference intensity, edge or wavelet images and of the input 
reference intensity, edge or wavelet images was computed. Then the maximum of all correlations was selected and the 
corresponding transformation was chosen as the best transformation. One of the main drawbacks of this method is the 
prohibitive computation times when the number of transformation parameters increases (e.g., affine transformation vs. shift- 
only), or when the size of the data increases (full size scenes vs. small portions, multi-band processing vs. mono-band). 

To answer some of these concerns, we are therefore investigating different types of similarity metrics, such as the 
partial Hausdorff distance 6 and Mutual Information 8 , and on different types of feature matching strategies, such as the 
Gradient Descent optimization methods (widely used for medical imagery 9 ) and robust feature matching methods 6 . 

2.3.1. Feature Extraction 

In this study, two types of features are being utilized, original gray levels and wavelet features. 

Gray Levels - Gray levels are considered as potential features and are combined either with Fast Fourier Transforms or 
with gradient descent methods 11 . The main challenge in utilizing gray levels for multi-sensor registration is to deal with 
different radiometries provided by multiple wavelengths, for which high-frequency edge or wavelet features might be more 
reliable. Below, we describe both feature extraction and feature matching techniques that are investigated in this study. 

Wavelets - The advantages of using a wavelet decomposition are threefold: 

(a) by using multi-resolution, one can bring different spatial resolution data to a common spatial resolution without losing 
any significant features, which is very useful for multi-sensor data, 

(b) by utilizing high-pass information, features similar to edge features are correlated in the registration process. When these 
features are extracted at a lower resolution of the decomposition, weak and noisy higher resolution features are eliminated. 

(c) by adopting the multiresolution structure of various wavelet decompositions, one can accelerate registration by 
computing cheaply an initial approximation of the desired transformation at a coarser level combined with recursive fine- 
tuning of that approximation at finer levels. 

But as the previous evaluation showed, orthogonal wavelets lack the property of translation- invariance: according to 
the Nyquist criterion, in order to distinguish between all frequency components and to avoid aliasing, the signal must be 
sampled at a frequency that is at least twice the signal’s highest frequency component. Therefore, as Simoncelli et al 
pointed out, “translation invariance cannot be expected in a system based on convolution and sub-sampling.” When using a 
separable orthogonal wavelet transform, information about the signal changes within or across sub-bands. By lack of 
translation invariance, we mean that the wavelet transform does not commute with the translation operator, and similar 
remarks can be made relative to the rotation operator. Following these remarks, we conducted two studies: 

(1) the first study 13 quantitatively assessed the use of orthogonal wavelet sub-bands as a function of features’ sizes. The 
results are summarized below: 

• the low-pass sub-band is relatively insensitive to translation, provided that the features of interest have an extent at least 
twice the size of the wavelet filters. 

• the high-pass sub-band is more sensitive to translation, but the peak correlations are still high enough to be useful. 

(2) the second study 5 investigated the use of an overcomplete frame representation, the “Steerable Pyramid”. It was shown 
that, as expected and due to their translation- and rotation- invariance, Simoncelli’s steerable filters perform better than 
Daubechies’ filters. Rotation errors obtained with steerable filters were minimum, while translation errors exhibited a 
periodicity of half the size of the filter, independent of rotation size or noise amount. Noise studies also reinforced the 
results that steerable filters show a better robustness to larger amounts of noise than orthogonal filters. 



2.3.2. Feature Matching 

Gray levels and wavelet features are then matched with various similarity metrics and feature matching strategies. 

2.3.2.1. Similarity Metrics 

Correlation - Our approach to image correlation involves the use of Fourier techniques in place of exhaustive search to Find 
the correlation peak very efficiently 10 . Remark that this method only searches for translations. The key idea is that the 
normalized correlation coefficient as a function of relative translation position reduces to a function of vector correlations in 
place of summations. The computation of the correlation coefficient then requires four Fourier transforms of real vectors 
and three inverse Fourier transforms to real vectors. See Stone 10 for more details. 

Mutual Information - The concept of mutual information represents a measure of relative entropy between two sets, which 
can also be described as a measure of information redundancy 8 . Mutual information has been extensively studied for 
medical imagery. From this definition, it can be easily shown that the mutual information of two images is maximal when 
these images are geometrically aligned. Therefore, in the context of image registration, mutual information is utilized as a 
similarity measure that, through its maximum, indicates the best match between a reference and an input image. Preliminary 
experiments 14 have shown that, in this context, mutual information enables to extract an optimal match with a much better 
accuracy than cross-correlation, and that it can be applied successfully to the registration of remotely sensed imagery, 
integrated in the multi-resolution framework of a wavelet decomposition and using an exhaustive search. This matching 
strategy is currently being replaced by an optimization technique. 

Partial Hausdorff Distance - We also consider a well-known robust measure of similarity, called the partial Hausdorff 
distance 23,24 . Consider the set of distances resulting from taking each point in one set, and finding the nearest point to it in 
the other set. Rather than taking the sum or the maximum of these distances, which may be affected by outliers (caused by, 
e.g., missing or occluded data), we consider the median or, in general, the £-th smallest distance. More formally, given two 
point sets A and 5, and a parameter k y 1< k < \A\ y we define the directed partial Hausdorff distance from A to B to be 

H k (A, B) = K ,h a in A min b in B dist (a,b), 

where K lh returns the k iU smallest element of the set, and where dist(a,b) is the Euclidean distance from a to b. The 
parameter k is typically based on a priori bounds on the number of points of A that are expected to have close matches in B 
under the optimum transformation. These are the inliers. 

2.3.2.1, Matching Strategy 

Gradient Descent Optimization - The gradient descent algorithm was based on work by Lucas and Kanade 15 in 1981, Irani 
and Peleg 16 in 1991, also described in Keren, Peleg and Brada 17 , and Thevenaz, Ruttimann and Unser 9 in 1998. Common to 
all three papers is image registration by iterative solution to least squares equations. Our implementation combined selected 
elements of each into one framework. 

The common approach starts by formulating image registration as a least squares minimization of the L2 norm 

£(/>) = X(/-<2„(g)) 2 

where E(p) is the error as a function of the parameters p, f and g are the reference and target images, and Q p is the 
transformation as applied to the target image g. From this formation of the error the standard least squares normal equations 
can derived with the complication of linearizing the transformation Q p by Taylors series expansion. Once this is done for the 
case p consists of translation in x and y, and rotation in 0, then the normal equations become the linear system, Ax=b 
defined as: 
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with f x and f y the spatial derivatives of the reference image f and R = xf y -yf x an approximation to the rotational derivative. 

The assumptions behind the linearization of the transform and the rotational derivative mean this equation is only valid 
for small transformations, so to solve for larger displacements two techniques are used. First, the equation is solved 
iteratively, a small step at a time, until it converges. Each iteration solves for a subpixel displacement that accumulates. 
Second, the equation can be used in a hierarchical, pyramid fashion by solving for the transformation on a reduced image 
where multipixel displacements become small, and then projecting the recovered solution on the next larger level. 


For one level in the pyramid the algorithm is the following: 



• Input: A reference image f and a target image g 

• Output: A parameter vector p with the recovered transform 

• Preprocessing: compute the image derivatives and the matrix A on the reference image f. These values do not change 
during the iterations unless a weighting term is used to vary pixel contributions to the final sum. 

• Step I . Warp or transform the target image g by the current best estimate of the parameters. 

• Step 2. Compute a mask based on the warping to mark where the image g overlaps the reference image f. 

• Step 3. Compute and sum image derivatives and differences on the overlapped area to compute the vector b. 

• Step 4. Solve the linear system for an incremental change in p and repeat the process from step 1 until convergence. 

Robust Feature Matching - This method is based on the principle of point mapping with feedback 2 . Specifically, given a 
set of control points in the reference image and a corresponding set of points in the sensed image within a pre-specifled 
transformation (e.g., rigid, affine), the method derives a computationally efficient algorithm to match these point patterns. In 
principle, our proposed method is similar to previous work 18,19 , whereby matching is achieved via cluster detection in 
transformation space. Whereas all of the above methods are computationally intensive (their running time is 0(N 4 ), where 
N denotes the number of control points), our proposed methodology is expected to yield much faster variants. An outline of 
our proposed algorithmic methodology consists of the following: 

1. Monte Carlo sampling of control points. 

2. Application of robust similarity measures (e.g., partial Hausdorff distance or k-th smallest squared distance to nearest 
neighbor). 

3. Searching the transformation space through hierarchical spatial subdivisions. 

4. Pruning the search space by ’’range" similarity estimates (bounds). (Note, a range of transformations corresponds to a 
(hyper) rectangle in transformation space.) 

5. Employment of fast data structures for nearest neighbor and range queries in image space. 

This method was tested on mono-sensor imagery artificially translated and rotated produced very promising results 6 . 
Furthermore, it was recently tested successfully in an end-to-end registration scheme aimed at geo-registering real, multi- 
band Landsat data 20 . 

In summary and as represented in Figure 1 by various thickness arrows, this study investigates five image registration 
methods, which combine in various ways the components described previously: 

• Method 1 (GC): Gray Levels matched by Fast Fourier Correlation Methods 

• Method 2 (GGD): Gray Levels matched by gradient descent 

• Method 3 (WCE): Simoncelli wavelet features matched by exhaustive search of the correlation maximum 

• Method 4 (WMIE): Simoncelli wavelet features matched by exhaustive search of the mutual information maximum 

• Method 5 (WHR): Simoncelli wavelet features matched by robust feature matching using a partial Hausdorff distance 

Features Metrics Strategy 



Wavelet Features ^ Partial Hausdorff Distance ! ^ Robust Matching 


Figure 1 

The Five Image Registration Algorithms of this Study 


3, RESULTS 


3.1. Description of the Test Datasets 

The dataset used for this study represents multi-sensor data acquired by four different sensors over one of the MODIS 
Validation Core Sites. The site is the Konza Prairie in the state of Kansas, in the Middle West region of the United States. 
The four sensors and their respective bands and spatial resolutions involved in this study are: 

• IKONOS Bands 3 (Red) and 4 (Near- Infrared), spatial resolution of 4 meters per pixel. 













• Landsat-7/ETM+ Bands 3 (Red) and 4 (Near-Infrared) , spatial resolution of 30 meters per pixel, 

• MODIS Bands 1 (Red) and 2 (Near Infrared), spatial resolution of 500 meters, per pixel, 

• SeaWTFS Bands 6 (Red) and 8 (Near Infrared), spatial resolution of 1000 meters per pixel. 

Since most of the algorithms considered for this study do not yet handle scale, we initially re-sampled the IKONOS and 
ETM+ data to the respective spatial resolutions of 3.91 and 31.25 meters, using the commercial software, PCI. This slight 
alteration in the resolution of the data enables to obtain similar spatial resolutions by performing the decimation step of the 
wavelet transform. Overall, we consider eight different images corresponding to different bands of different sensors. Each 
of these images is also pre-processed so that its dimensions in x and y are multiples of 2 L , where L is the maximum number 
of wavelet decomposition levels used in the registration process. After pre-processing, we have the following dataset: 

• IKONOS images: "iko red 3.91 .power" and "iko_nir_3. 91. power", size 2048 columns by 2048 rows, 

• ETM images: "etm_red_31 .25. power" and "etm_nir_31 ,25. power", size 7552 columns by 6784 rows, 

• MODIS images: "modis_day249_cc_red.power" and "modis_day249_cc_nir.power", size 776 columns by 392 rows, 

• SeaWIFS images: ,, seawifs_day256 red.power ,, and "seawifs day256_nir.power", size 472 columns by 456 rows. 

Figure 2 shows one band of each of these scenes. 

3.2. Results 

Table 1 shows the pairs of images registered by the five algorithms. For each pair, we used the UTM (Universal 
Transverse Mercator) coordinates of the upper left scene corner to extract windows in the larger sensor scene that 
correspond to smaller sensor scenes. For example, a window was extracted in the Landsat scene that is registered to the 
IKONOS scene. Similarly, windows are extracted in the MODIS scene to be registered to the Landsat scene, and windows 
are extracted from the SeaWIFS scene to be registered to the MODIS scene. Then, the Simoncelli steerable pyramid is 
applied to the highest resolution images to bring them to the same spatial resolution than the lowest spatial resolution 
images of each pair, through down-sampling. This decomposition is then pursued up to four levels that provide registration 
features for Methods 3,4, and 5. After wavelet decomposition, a masking process is applied that eliminates border effects 
due to the filter convolution operation. 

Table I shows the results obtained by the five algorithms for seven pairs of multi-band or multi-sensor images. All 
results obtained by the 5 algorithms are similar within 0.5 degrees in rotation and within 1 pixel in translation. For pair (1), 
due to the very large size of these images that implies very large memory requirements, seven sub-windows were manually 
extracted from both bands and the results of these sub-windows registrations all show a rotation of 0 degree and a 
translation in pixels of (0,0). The only pairs for which we performed manual registration are pairs (2) and (3), IKONOS to 
ETM data, and found the following transformation; rotation^ degrees, tran$lation=(2,0) pixels. Most of all the algorithms 
agree with this ground truth. 

3.3 Discussion and Future Work 

Future work will include detailed quantitative inter-comparison of these five algorithms. For such a comparison, and in 
the general framework of on-board automatic image registration, we feel that the most important criteria and the first ones 
that we will consider are the following: 

• reliability of such automatic algorithms, and especially the capability to compute a “confidence measure" of the 
registration methods, 

• sub-pixel accuracy of the registration, 

• low computational requirements (computation time, memory, storage). 

Algorithm confidence measurement - The use of a fully automatic registration algorithm, particularly on-board an 
independently operating spacecraft, leads to the question of how to identify gross errors when conditions cause the 
algorithm to fail. A measure of registration confidence would be useful to detect failure modes. Typically, registration 
algorithms minimize a measure of image difference to find a solution. These algorithms can get trapped in local minimums 
or find a solution when none should exist. Ground truth can allow for testing of algorithm performance in controlled cases, 
but in operational use external ground truth is not available. Internal image statistics can be used to develop a measure of 
confidence in a registration solution. In a highly idealized case, the difference image between the two output registered 
images should be based entirely on the noise parameters of the image source. If the difference image exhibits regular 
patterns, or textural measures not typically of the noise distribution, then we would suspect the images are not properly 
registered, or there are scene changes that could corrupt the registration solution. The key idea here is that, while doing 
registration we use computationally inexpensive measures to allow rapid registration. After registration is complete, we can 
apply more expensive analysis to verify that the image differences fit a priori statistical models. Despite the importance of 
this issue to the system engineering of a fully automatic registration algorithm, little has appeared in the literature on how to 
verify solutions except for known feature point matching 21 . 



Registration Accuracy - Several methods can be thought of to quantify the accuracy of a given registration method: 

(1) a first method consists of registering the same set of data manually and automatically. Then, considering the manual 
registration as our “ground truth”, the error between manual and automatic registration characterizes the accuracy of the 
automatic registration. 

(2) when GPS (Ground Positioning System) data are available, a few very accurate Ground Control Points can be selected 
and their UTM or (Latitude, Longitude) coordinates utilized to compute the accuracy of the registration. 

In general, when performing multi-resolution data registration, the goal is to register the images at the fine resolution 
rather than the coarse resolution. In preliminary experiments that will be pursued in future work 22 , the main finding is that 
the resolution attainable is probably between these two original resolutions, at about 1/8 of the pixel size of the lowest 
resolution. Results show that the down-sampling of the high-resolution images plays a role in determining the precision of 
the registration. This issue will be investigated further in future work. 

Computational Requirements - Image registration is a time-crucial and computationally demanding process. The 
computational requirements of each method will be computed by two means: (1) the computational complexity of each 
algorithm will be evaluated, and (2) the implementation of each method on target architectures will be investigated. 
Furthermore, we will investigate memory and storage usage of the algorithms as well as input/output (I/O) requirements. 


4. CONCLUSIONS AND FUTURE WORK 

The study presented in this paper compares registration results provided by five different algorithms utilizing gray 
levels and wavelet features combined with correlation, mutual information and partial Hausdorff distance as similarity 
metrics, and Fourier Transforms, exhaustive search, gradient descent, and robust feature matching as search strategies. 
Results are very similar and are within one pixel of each other for most registrations. Future work will involve quantitative 
comparison of these results, involving a larger dataset, systematic ground truth, and accurate measurements of sub-pixel 
accuracy and computational timings. Future coarse-grained implementation of these algorithms in a high performance 
parallel computing environment will also be investigated. 
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Table 1 

Results Obtained by The Five Image Registration Algorithms on the Konza Prairie ( Kansas ) Multi-Sensor Dataset 
Rotations are in degrees , Translations are in pixels corresponding to the lowest resolution of the registered pairs . 
























Figure 2 

IKON OS, Landsat/ETM, MODIS and Sea W I FS Images of the Konza Prairie in Kansas, US 


